informative region
MambaScope: Coarse-to-Fine Scoping for Efficient Vision Mamba
Liu, Shanhui, Xu, Rui, Wang, Yunke
Vision Mamba has emerged as a promising and efficient alternative to Vision Transformers, yet its efficiency remains fundamentally constrained by the number of input tokens. Existing token reduction approaches typically adopt token pruning or merging to reduce computation. However, they inherently lead to information loss as they discard or compress token representations. This problem is further exacerbated when the same fine-grained token processing is uniformly applied across all images regardless of visual complexity. We observe that not all inputs require fine-grained processing: simple images can be effectively handled at a coarse resolution, while only complex ones require refinement. Based on this insight, we propose MambaScope, an adaptive framework for efficient inference for Vision Mamba. MambaScope first performs coarse-grained inference by dividing the input image into large patches, significantly reducing token length and computation. When the model's prediction confidence is low, selected regions are re-processed at a finer resolution to recover essential visual details with minimal additional cost. This dynamic resolution assignment strategy allows MambaScope to allocate computation adaptively according to image complexity, achieving efficient processing without compromising accuracy. Experiments across various vision tasks demonstrate that MambaScope outperforms both the baseline Vision Mamba and state-of-the-art token reduction techniques in terms of accuracy and efficiency.
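The confidence-gated control flow described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: `coarse_model`, `fine_model`, and the `threshold` value are hypothetical placeholders, and the region-selection step is folded into `fine_model` rather than modeled explicitly.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def coarse_to_fine_predict(image, coarse_model, fine_model, threshold=0.7):
    """Run the cheap coarse pass first; refine only when confidence is low.

    coarse_model and fine_model are hypothetical callables mapping an image
    to class logits. threshold is an assumed value, not one from the paper.
    """
    coarse_probs = softmax(coarse_model(image))
    confidence = max(coarse_probs)
    if confidence >= threshold:
        # Simple image: the coarse prediction is accepted as-is.
        return coarse_probs.index(confidence), "coarse"
    # Complex image: pay the extra cost of a finer-resolution pass.
    fine_probs = softmax(fine_model(image))
    return fine_probs.index(max(fine_probs)), "fine"
```

With this structure, the average cost per image depends on how often the coarse pass falls below the threshold, which is what lets computation track image complexity.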
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > China > Guangxi Province > Nanning (0.04)
- Asia > China > Liaoning Province > Dalian (0.05)
- Asia > China > Hong Kong (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
Top-$n\sigma$: Not All Logits Are You Need
Tang, Chenxia, Liu, Jianchun, Xu, Hongli, Huang, Liusheng
Large language models (LLMs) typically employ greedy decoding or low-temperature sampling for reasoning tasks, reflecting a perceived trade-off between diversity and accuracy. We challenge this convention by introducing top-$n\sigma$, a novel sampling method that operates directly on pre-softmax logits by leveraging a statistical threshold. Our key insight is that logits naturally separate into a Gaussian-distributed noisy region and a distinct informative region, enabling efficient token filtering without complex probability manipulations. Unlike existing methods (e.g., top-$p$, min-$p$) that inadvertently include more noise tokens at higher temperatures, top-$n\sigma$ maintains a stable sampling space regardless of temperature scaling. We also provide a theoretical analysis of top-$n\sigma$ to better understand its behavior. The extensive experimental results across four reasoning-focused datasets demonstrate that our method not only outperforms existing sampling approaches but also surpasses greedy decoding, while maintaining consistent performance even at high temperatures.
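A minimal sketch of a statistical threshold on pre-softmax logits is given below. It assumes the filter keeps tokens whose logit lies within `n` standard deviations of the maximum logit; the function names and the default `n=1.0` are illustrative choices, not taken from the abstract.

```python
import math
import random

def top_n_sigma_mask(logits, n=1.0):
    """Indices of tokens whose logit is >= max(logits) - n * std(logits)."""
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * sigma
    return [i for i, x in enumerate(logits) if x >= threshold]

def sample_top_n_sigma(logits, n=1.0, temperature=1.0, rng=random):
    """Temperature-scale the logits, filter them, then sample from the rest."""
    scaled = [x / temperature for x in logits]
    keep = top_n_sigma_mask(scaled, n)
    m = max(scaled[i] for i in keep)
    # Unnormalized softmax weights over the surviving tokens only.
    weights = [math.exp(scaled[i] - m) for i in keep]
    return rng.choices(keep, weights=weights, k=1)[0]
```

Dividing the logits by a positive temperature rescales the maximum and the standard deviation by the same factor, so the set of kept tokens does not change. Temperature then only reshapes the probabilities inside that set, which matches the stability claim in the abstract.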
- Asia > China (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
Synthetic Information towards Maximum Posterior Ratio for deep learning on Imbalanced Data
This study examines the impact of class-imbalanced data on deep learning models and proposes a technique for data balancing by generating synthetic data for the minority class. Unlike random-based oversampling, our method prioritizes balancing the informative regions by identifying high-entropy samples. Generating well-placed synthetic data can enhance machine learning algorithms' accuracy and efficiency, whereas poorly placed ones may lead to higher misclassification rates. We introduce an algorithm that maximizes the probability of generating a synthetic sample in the correct region of its class by optimizing the class posterior ratio. Additionally, to maintain data topology, synthetic data are generated within each minority sample's neighborhood. Our experimental results on forty-one datasets demonstrate the superior performance of our technique in enhancing deep-learning models.
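One way to picture the high-entropy selection step is sketched below. It assumes class posteriors are estimated from k-nearest-neighbour label frequencies, which is a stand-in chosen for illustration; the posterior-ratio optimization and the synthetic-sample generation are not covered, and all function names are hypothetical.

```python
import math

def knn_posterior(point, data, labels, k=3):
    """Estimate P(class | point) from the labels of the k nearest neighbours."""
    order = sorted(range(len(data)), key=lambda i: math.dist(point, data[i]))
    nearest = order[:k]
    counts = {}
    for i in nearest:
        counts[labels[i]] = counts.get(labels[i], 0) + 1
    return {c: cnt / len(nearest) for c, cnt in counts.items()}

def entropy(posterior):
    """Shannon entropy (in nats) of a discrete class posterior."""
    return -sum(p * math.log(p) for p in posterior.values() if p > 0)

def rank_minority_by_entropy(data, labels, minority, k=3):
    """Indices of minority samples, most class-ambiguous neighbourhood first."""
    scored = []
    for i, (x, y) in enumerate(zip(data, labels)):
        if y != minority:
            continue
        # Leave the sample itself out so it does not vote for its own class.
        others = data[:i] + data[i + 1:]
        other_labels = labels[:i] + labels[i + 1:]
        scored.append((entropy(knn_posterior(x, others, other_labels, k)), i))
    return [i for _, i in sorted(scored, reverse=True)]
```

Minority samples near the class boundary receive mixed neighbour labels and therefore rank first, while samples deep inside a pure minority cluster score zero entropy and rank last.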
- North America > United States > Florida > Hillsborough County > Tampa (0.14)
- North America > United States > Wisconsin (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- (9 more...)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.48)
Active Matting
Yang, Xin, Xu, Ke, Chen, Shaozhe, He, Shengfeng, Yin, Baocai, Lau, Rynson
Image matting is an ill-posed problem. It requires a user-supplied trimap or some strokes to obtain an alpha matte of the foreground object. Fine user input is essential to obtain a good result, but providing it is either time-consuming or practical only for experienced users who know where to place the strokes. In this paper, we explore the intrinsic relationship between the user input and the matting algorithm to address the problem of where and when the user should provide the input. Our aim is to discover the most informative sequence of regions for user input in order to produce a good alpha matte with minimum labeling effort. To this end, we propose an active matting method with recurrent reinforcement learning. The proposed framework involves a human in the loop by sequentially detecting informative regions for trivial human judgement. Compared to traditional matting algorithms, the proposed framework requires much less effort and can produce satisfactory results with just 10 regions. Through extensive experiments, we show that the proposed model reduces user effort significantly and achieves comparable performance to dense trimaps in a user-friendly manner. We further show that the learned informative knowledge can be generalized across different matting algorithms.
- Asia > China > Liaoning Province > Dalian (0.05)
- Asia > China > Hong Kong (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
Persistent Monitoring of Stochastic Spatio-temporal Phenomena with a Small Team of Robots
In scenarios such as natural disasters, seasonal agriculture, and other short-duration operations, a rapidly deployable, autonomous mobile sensing system that decides where to take sensor measurements can be more versatile and cost-effective than installing stationary sensors. In this work, we are interested in formulating a solution for persistent sensing of real-world stochastic phenomena using a team of mobile robots, even when the underlying covariance structure changes sharply across time, such as sunlight variation in a forest understory (Figure 1). Assuming no prior knowledge of the underlying model of the phenomenon dynamics, this presents two challenges: 1) adapting a belief on the underlying model based on recently observed phenomenon dynamics and 2) correspondingly optimizing the next sensing locations. While exactly modeling stochastic real-world phenomena remains a significant challenge, this work deals mainly with modeling the underlying covariance structure. The underlying covariance structure directly corresponds to information metrics such as entropy, required for evaluating the informativeness or representativeness of sensor readings across a set of locations [9, 13, 30]. Gaussian processes (GP) have emerged as a favored choice for this specific modeling goal primarily because of their nonparametric nature [14, 20, 33, 42].
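The link between covariance and entropy mentioned above can be made concrete with the standard differential entropy of a multivariate Gaussian. The notation is an assumption introduced here for illustration: $A$ is a set of candidate sensing locations, $\mathbf{y}_A$ the readings at those locations, and $\Sigma_{AA}$ their covariance matrix under the GP.

$$H(\mathbf{y}_A) = \tfrac{1}{2}\ln\det\left(2\pi e\,\Sigma_{AA}\right) = \tfrac{1}{2}\ln\det\Sigma_{AA} + \tfrac{|A|}{2}\ln(2\pi e)$$

Because the second term depends only on the number of locations, comparing equally sized location sets by entropy reduces to comparing $\ln\det\Sigma_{AA}$, which is why an accurate covariance model is the quantity that matters for this criterion.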
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > New York (0.04)
- Europe > Germany > Berlin (0.04)
- (4 more...)